Pairwise sequence alignment - it's all about us!
Abstract
Pairwise alignment is one of the most fundamental tools of bioinformatics and underpins a variety of other, more sophisticated methods of annotation. In its most rigorous form it uses a method called ‘dynamic programming’, which is highly accurate but also very costly to compute. To align anything other than an exact alphabetic match, the algorithm has to know what it is looking for and how to evaluate the worth of what it finds. To this end, ‘comparison matrices’ have been created which define a score for every possible pairing of bases or residues, in effect a tally of how well the computational alignment is doing. The software searches for the highest score available. The final score is meaningful only together with its resulting alignment and cannot be used outside this context.

In the case of DNA, comparison values are generated using a simple identity matrix, which gives one (positive) score for a correct match within the alignment and a different (zero or negative) score for a mismatch. A fuller comparison matrix that allows for ambiguity codes alters these basic values as potential transitions and transversions are taken into account, but essentially there is very little mathematical difference that can be achieved between one alignment and another similar one.

Protein matrices, on the other hand, offer a greater breadth of calculation: not only are there five times as many common amino acid residues as there are DNA bases, but the matrices also incorporate a significant amount of evolutionary information. The most common matrices here are the point accepted mutation (PAM) [1, 2] and BLOSUM [3] comparison tables. The first PAM matrix was created in the late 1970s and relied on noting accepted residue substitutions within protein sequences to produce the PAM 1 table. Subsequent tables in the PAM family have been created by multiplication models based on that first matrix.
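The multiplication idea behind the PAM family can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-letter alphabet and the values in `TOY_STEP` are invented for the example and are not Dayhoff's data, and the helper names `mat_mul` and `mat_power` are hypothetical. The sketch multiplies a one-step substitution probability matrix by itself n times, which is how a higher-numbered table can be derived from the first one.

```python
# Toy illustration only: a made-up 3-letter alphabet, NOT real PAM data.

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, n):
    """Return m multiplied by itself n times (n = 0 gives the identity)."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Hypothetical one-step matrix: row i holds the probabilities that letter i
# stays the same (diagonal, 0.99) or is replaced by another letter.
# Each row sums to 1.
TOY_STEP = [
    [0.990, 0.008, 0.002],
    [0.006, 0.990, 0.004],
    [0.003, 0.007, 0.990],
]

if __name__ == "__main__":
    for steps in (1, 100, 250):
        powered = mat_power(TOY_STEP, steps)
        diagonal = [round(powered[i][i], 3) for i in range(3)]
        print(steps, diagonal)
```

As the number of steps grows, the diagonal entries (the chance of a letter appearing unchanged) shrink, which mirrors the statement that higher PAM numbers correspond to greater divergence. Real PAM tables additionally convert these probabilities into log-odds scores, a step this sketch leaves out.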
The greater the multiplication, the higher the number in the PAM series, the greater the number of accepted mutations involved in the proteins used to create the tables, and thus the greater the evolutionary divergence of those proteins. One of the more common matrices, the PAM 250 matrix, represents a subset of proteins of approximately 80% divergence.

The BLOSUM matrices were created in the early 1990s and relied on the presence of residues within blocks of conserved regions of related proteins. These blocks can be accessed in the BLOCKS [4, 5] database. The BLOSUM matrices also form a family differentiated by numbers. Here, however, the number represents the minimum percentage identity of the BLOCKS used to create the matrix. The most common matrix of this set is BLOSUM 62, a default setting for many protein alignment applications; the name indicates that BLOCKS of at least 62% identity were used in its creation. For the BLOSUM matrices, therefore, the higher the number, the smaller the evolutionary divergence between sequences, which is the opposite of the PAM convention.

Once the comparison matrix has been established, the computer can build its own matrix based on the two sequences to be aligned, inserting a ‘score’ for each potential base or residue pairing. However, allowing the computer to score each comparison and select the one match, or run of matches, that gives the highest score would not necessarily yield the best alignment, as it ignores biological insertions and...
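The score-matrix step described above can be illustrated with a minimal dynamic programming sketch. It assumes a Needleman-Wunsch-style global alignment, a DNA identity matrix (+1 match, -1 mismatch) and a linear gap penalty of -2; the abstract is cut off before it describes how gaps are handled, so the gap scheme and every name here (`identity_score`, `global_alignment_score`) are assumptions made for the example, not the article's method.

```python
def identity_score(a, b, match=1, mismatch=-1):
    """Identity comparison matrix: one score for a match, another for a mismatch."""
    return match if a == b else mismatch


def global_alignment_score(seq1, seq2, gap=-2, score=identity_score):
    """Best global alignment score, filled in cell by cell.

    table[i][j] holds the best score for aligning the first i symbols of
    seq1 with the first j symbols of seq2.
    """
    rows, cols = len(seq1) + 1, len(seq2) + 1
    table = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per symbol.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = table[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1])
            gap_in_seq2 = table[i - 1][j] + gap   # symbol of seq1 against a gap
            gap_in_seq1 = table[i][j - 1] + gap   # symbol of seq2 against a gap
            table[i][j] = max(diagonal, gap_in_seq2, gap_in_seq1)

    return table[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "ACGT"))
    print(global_alignment_score("ACGT", "AGT"))
```

For protein sequences, the `score` argument could be swapped for a lookup into a PAM or BLOSUM table. The nested loop visits every cell once, so the work grows with the product of the two sequence lengths, which is why the rigorous method is costly.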
Similar resources
gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences
Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...
Evaluation of sequence alignments of distantly related sequence pairs with respect to structural similarity.
We evaluate the performance of common substitution matrices with respect to structural similarities. For this purpose, we apply an all-versus-all pairwise sequence alignment on the ASTRAL40 [7] dataset, consisting of 7290 entries with a pairwise sequence identity of at most 40%. Afterwards, we compare the 100 highest scoring sequence alignments to their corresponding structural alignments, whic...
Using Iterative Dynamic Programming to Obtain Accurate Pairwise and Multiple Alignments of Protein Structures
We show how a basic pairwise alignment procedure can be improved to more accurately align conserved structural regions, by using variable, position-dependent gap penalties that depend on secondary structure and by taking the consensus of a number of suboptimal alignments. These improvements, which are novel for structural alignment, are direct analogs of what is possible with normal sequences a...
Efficient mapping of genomic sequences to optimize multiple pairwise alignment in hybrid cluster platforms
Multiple sequence alignment (MSA), used in biocomputing to study similarities between different genomic sequences, is known to require important memory and computation resources. Nowadays, researchers are aligning thousands of these sequences, creating new challenges in order to solve the problem using the available resources efficiently. Determining the efficient amount of resources to allocat...
Evolutionary Two-Dimensional DNA Sequence Alignment
This article presents a model for DNA sequence alignment. In our model, a finite state automaton writes two-dimensional maps of nucleotide sequences. An evolutionary method for sequence alignment from this representation is proposed. We use HIV as the working example. Experimental results indicate that structural similarities produced by two-dimensional representation of sequences allow us to p...
Journal title:
- Briefings in Bioinformatics
Volume 7, Issue 1
Pages -
Publication date: 2006